Loading and Cleaning CSV or XLS Files

By Hubert Ronald / May 28, 2018

Loading and cleaning data that comes in CSV or XLS format is a critical first step before you start working in pandas.

First, define a helper function to clean the index column:

import re  # regular expressions for string cleanup

def renameIndex(as_lista):
    """Clean each label in place and return the list."""
    for v in range(len(as_lista)):
        as_lista[v] = re.sub(r"[\(\[].*?[\)\]]", "", as_lista[v])           # drop text inside () or []
        as_lista[v] = ''.join([i for i in as_lista[v] if not i.isdigit()])  # drop digits
        as_lista[v] = as_lista[v].strip()             # remove leading and trailing spaces
        as_lista[v] = " ".join(as_lista[v].split())   # collapse repeated spaces

    return as_lista
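
To see what the cleaning steps do, here is a minimal, self-contained sketch that applies the same four steps to one label at a time. The name clean_label and the sample labels are made up for illustration; they are not part of the World Bank file.

```python
import re

def clean_label(label):
    """Hypothetical single-label version of the cleaning steps above."""
    label = re.sub(r"[\(\[].*?[\)\]]", "", label)             # drop text inside () or []
    label = "".join(ch for ch in label if not ch.isdigit())   # drop digits
    return " ".join(label.split())                            # trim and collapse spaces

# Made-up labels, only to show the effect of each step
samples = ["Korea, Rep. (2018)", "  Congo  [DRC] 2 ", "Egypt,  Arab   Rep."]
cleaned = [clean_label(s) for s in samples]
print(cleaned)  # ['Korea, Rep.', 'Congo', 'Egypt, Arab Rep.']
```

Working on one string at a time keeps the function easy to test, and a list comprehension then covers the whole index.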

Download the file from The World Bank; after that, you can load it with pandas:

import pandas as pd

GDP = pd.read_csv('API_NY.GDP.MKTP.CD_DS2_en_excel_v2_9943054.csv' , index_col=0, skiprows=4, delimiter=',', encoding="utf-8-sig")
as_lista = GDP.index.tolist()
GDP.index = renameIndex(as_lista) # Helper Function

The index is now clean!
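
If you want to try the whole flow without downloading anything, the sketch below reads a tiny in-memory CSV and cleans the index with pandas' vectorized .str methods instead of a Python loop. This is an alternative, not the approach used above. The sample text is made up and only assumes that four lines precede the header row, which is why skiprows=4 is used; note also that the pattern \d may not match exactly the same characters as str.isdigit() for unusual Unicode digits.

```python
import io
import pandas as pd

# Made-up sample: four lines of metadata, then the header row and two records
raw = (
    "Data Source,made-up sample\n"
    "Last Updated Date,2018-05-21\n"
    "note,line three\n"
    "note,line four\n"
    "Country Name,Country Code,2016\n"
    '"Aruba (1)",ABW,10\n'
    '"Egypt,  Arab Rep.",EGY,20\n'
)

# Skip the four metadata lines and use the first column as the index
df = pd.read_csv(io.StringIO(raw), index_col=0, skiprows=4)

# Same cleaning steps, expressed with vectorized string methods
df.index = (
    df.index.str.replace(r"[\(\[].*?[\)\]]", "", regex=True)  # drop text inside () or []
    .str.replace(r"\d", "", regex=True)                       # drop digits
    .str.strip()                                              # trim the ends
    .str.replace(r"\s+", " ", regex=True)                     # collapse repeated spaces
)

print(df.index.tolist())  # ['Aruba', 'Egypt, Arab Rep.']
```

For an XLS file the cleaning part would stay the same; only the loading call changes (pandas provides read_excel for that).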





Hubert Ronald

Hiya. I'm Ronald, an Industrial Engineer from Colombia. Hope you enjoyed my ad-free, bullshit-free site. If you liked it, tell someone about it.
